
    RNA-RNA interaction prediction based on multiple sequence alignments

    Many computational methods for RNA-RNA interaction structure prediction have been developed. Recently, O(N^6) time and O(N^4) space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate knowledge of related sequences, so relevant evolutionary information is often neglected in structure determination. Therefore, it is of considerable practical interest to introduce a method that takes both thermodynamic stability and sequence covariation into account. We present the a priori folding algorithm ripalign, whose input consists of two given multiple sequence alignments (MSAs). ripalign outputs (1) the partition function, (2) base-pairing probabilities, (3) hybrid probabilities and (4) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible with the alignments. Compared to the single sequence-pair folding algorithm rip, ripalign requires negligible additional memory. Furthermore, we incorporate possible structure constraints as input parameters into our algorithm. The algorithm described here is implemented in C as part of the rip package. The supplemental material, source code and input/output files can be freely downloaded from http://www.combinatorics.cn/cbpc/ripalign.html. Contact: Christian Reidys [email protected]. Comment: 8 pages, 9 figures

    The Role of Non-Coding RNAs in the Human Placenta

    Non-coding RNAs (ncRNAs) play a central and regulatory role in almost all cells, organs, and species, which has been broadly recognized since the human ENCODE project and several other genome projects. Nevertheless, only a small fraction of ncRNAs have been identified, and in the placenta they have been investigated only marginally. To date, most examples of ncRNAs that have been identified as specific for fetal tissues, including placenta, are members of the group of microRNAs (miRNAs). Due to their quantity, it can be expected that the considerably larger group of other ncRNAs exerts far stronger effects than miRNAs. The syncytiotrophoblast of fetal origin forms the interface between fetus and mother, and permanently releases extracellular vesicles (EVs) into the maternal circulation which contain fetal proteins and RNA, including ncRNA, for communication with neighboring and distant maternal cells. Disorders of ncRNA in placental tissue, especially in trophoblast cells, and in EVs seem to be involved in pregnancy disorders, potentially as a cause or consequence. This review summarizes the current knowledge on placental ncRNAs, their transport in EVs, and their involvement in pregnancy pathologies, as well as their potential for novel diagnostic tools.

    Graph-distance distribution of the Boltzmann ensemble of RNA secondary structures

    BACKGROUND: Large RNA molecules are often composed of multiple functional domains whose spatial arrangement strongly influences their function. Pre-mRNA splicing, for instance, relies on the spatial proximity of the splice junctions that can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Albeit a crude measure, the distribution of spatial distances in thermodynamic equilibrium harbors useful information on the shape of the molecule that in turn can give insights into the interplay of its functional domains. RESULTS: Spatial distance can be approximated by the graph-distance in RNA secondary structure. We show here that the equilibrium distribution of graph-distances between a fixed pair of nucleotides can be computed in polynomial time by means of dynamic programming. While a naïve implementation would yield recursions with a very high time complexity of O(n^6 D^5) for sequence length n and D distinct distance values, it is possible to reduce this to O(n^4) for practical applications in which predominantly small distances are of interest. Further reductions, however, seem to be difficult. Therefore, we introduced sampling approaches that are much easier to implement. They are also theoretically favorable for several real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules. CONCLUSIONS: The graph-distance distribution can be computed using a dynamic programming approach. Although a crude approximation of reality, our initial results indicate that the graph-distance can be related to the smFRET data. The additional file and the software of our paper are available from http://www.rna.uni-jena.de/RNAgraphdist.html
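
    To illustrate what "graph-distance in RNA secondary structure" means, here is a minimal sketch that computes the shortest-path distance between two nucleotides in a single, fixed structure given in dot-bracket notation. It assumes unit weight for both backbone edges and base-pair edges, and it assumes pseudoknot-free, balanced input; the function names pair_table and graph_distance are hypothetical and not taken from the RNAgraphdist software. The abstract's method goes further and computes the distribution of this quantity over the whole Boltzmann ensemble, which this sketch does not attempt.

```python
from collections import deque


def pair_table(structure):
    """Map each paired position to its partner (0-based) for a
    balanced, pseudoknot-free dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def graph_distance(structure, start, end):
    """Breadth-first search over the structure graph.

    Vertices are nucleotides; edges join backbone neighbours (i, i+1)
    and base-paired positions. Every edge has weight 1 (an assumption
    made for this illustration).
    """
    partner = pair_table(structure)
    n = len(structure)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if i == end:
            return dist[i]
        neighbours = [i - 1, i + 1]
        if i in partner:
            neighbours.append(partner[i])
        for j in neighbours:
            if 0 <= j < n and j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return None  # end index lies outside the structure
```

    In a hairpin such as "((((....))))" the two sequence ends are base-paired, so their graph-distance is a single edge, whereas in the fully unpaired structure of the same length the only path runs along the backbone.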

    U7 snRNAs

    U7 small nuclear RNA (snRNA) sequences have been described only for a handful of animal species in the past. Here we describe a computational search for functional U7 snRNA genes throughout vertebrates, including the upstream sequence elements characteristic of snRNAs transcribed by polymerase II. Based on the results of this search, we discuss the high variability of U7 snRNAs in both sequence and structure, and report on an attempt to find U7 snRNA sequences in basal deuterostome and non-drosophilid insect genomes based on a combination of sequence, structure, and promoter features. Due to the extremely short sequence and the high variability in both sequence and structure, no unambiguous candidates were found. These results cast doubt on putative U7 homologs in even more distant organisms that are reported in the most recent release of the Rfam database.

    Finding approximate gene clusters with GECKO 3

    Winter S, Jahn K, Wehner S, et al. Finding approximate gene clusters with GECKO 3. Nucleic Acids Research. 2016;44(20):9600-9610. Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, open-source software for finding gene clusters in hundreds of bacterial genomes that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters by reviewing the literature and comparing them to a database of operons; we detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min.

    A marine Chlamydomonas sp. emerging as an algal model

    The freshwater microalga Chlamydomonas reinhardtii, which lives in wet soil, has served for decades as a model for numerous biological processes, and many tools have been introduced for this organism. Here, we have established a stable nuclear transformation for its marine counterpart, Chlamydomonas sp. SAG25.89, by fusing specific cis-acting elements from its Actin gene with the gene providing hygromycin resistance and using an elaborated electroporation protocol. Like C. reinhardtii, Chlamydomonas sp. has a high GC content, allowing reporter genes and selection markers to be applicable in both organisms. Chlamydomonas sp. grows purely photoautotrophically and requires ammonia as a nitrogen source because its nuclear genome lacks some of the genes required for nitrogen metabolism. Interestingly, it can grow well under both low and very high salinities (up to 50 g·L⁻¹), making it a model for osmotolerance. We further show that Chlamydomonas sp. grows well from 15 to 28°C, but halts its growth at 32°C. The genome of Chlamydomonas sp. contains some gene homologs whose expression is regulated according to the ambient temperature and/or which confer thermal acclimation in C. reinhardtii. Thus, knowledge of temperature acclimation can now be compared to the marine species. Furthermore, Chlamydomonas sp. can serve as a model for studying marine microbial interactions and for comparing mechanisms in freshwater and marine environments. Chlamydomonas sp. was previously shown to be immobilized rapidly by a cyclic lipopeptide secreted from the antagonistic bacterium Pseudomonas protegens PF-5, which deflagellates C. reinhardtii.

    Detection of very long antisense transcripts by whole transcriptome RNA-Seq analysis of Listeria monocytogenes by semiconductor sequencing technology

    The Gram-positive bacterium Listeria monocytogenes is the causative agent of listeriosis, a severe food-borne infection characterised by abortion, septicaemia, or meningoencephalitis. L. monocytogenes causes outbreaks of febrile gastroenteritis and accounts for community-acquired bacterial meningitis in humans. Listeriosis has one of the highest mortality rates (up to 30%) of all food-borne infections. This human pathogenic bacterium is an important model organism for biomedical research to investigate cell-mediated immunity. L. monocytogenes is also one of the best characterised bacterial systems for the molecular analysis of intracellular parasitism. Recently, several transcriptomic studies have also established this ubiquitously distributed bacterium as a model to understand mechanisms of gene regulation from the environment to the infected host on the level of mRNA and non-coding RNAs (ncRNAs). We have used semiconductor sequencing technology for RNA-seq to investigate the repertoire of listerial ncRNAs under extra- and intracellular growth conditions. Furthermore, we applied a new bioinformatic analysis pipeline for detection, comparative genomics and structural conservation to identify ncRNAs. With this work, a total of 741 locations of potential ncRNA candidates are now known for L. monocytogenes, of which 611 ncRNA candidates were identified by RNA-seq. 441 transcribed ncRNAs have never been described before. Among these, we identified novel long non-coding antisense RNAs with a length of up to 5,400 nt, e.g. opposite to genes coding for internalins, methylases or a high-affinity potassium uptake system, namely the kdpABC operon, which were confirmed by qRT-PCR analysis. RNA-seq, comparative genomics and structural conservation of L. monocytogenes ncRNAs illustrate that this human pathogen uses a large number and repertoire of ncRNAs, including novel long antisense RNAs, which could be important for intracellular survival within the infected eukaryotic host.

    Challenges in RNA virus bioinformatics

    Motivation: Computer-assisted studies of the structure, function and evolution of viruses remain a neglected area of research. The attention of bioinformaticians to this interesting and challenging field is far from commensurate with its medical and biotechnological importance. It is telling that out of >200 talks held at ISMB 2013, the largest international bioinformatics conference, only one presentation explicitly dealt with viruses. In contrast to many broad, established and well-organized bioinformatics communities (e.g. structural genomics, ontologies, next-generation sequencing, expression analysis), research groups focusing on viruses can probably be counted on the fingers of two hands. Results: The purpose of this review is to increase awareness among bioinformatics researchers about the pressing needs and unsolved problems of computational virology. We focus primarily on RNA viruses that pose problems to many standard bioinformatics analyses owing to their compact genome organization, fast mutation rate and low evolutionary conservation. We provide an overview of tools and algorithms for handling viral sequencing data, detecting functionally important RNA structures, classifying viral proteins into families and investigating the origin and evolution of viruses. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online. The references for this article can be found in the Supplementary Material.

    Homology-based annotation of non-coding RNAs in the genomes of Schistosoma mansoni and Schistosoma japonicum

    Background: Schistosomes are trematode parasites of the phylum Platyhelminthes. They are considered the most important of the human helminth parasites in terms of morbidity and mortality. Draft genome sequences are now available for Schistosoma mansoni and Schistosoma japonicum. Non-coding RNA (ncRNA) plays a crucial role in gene expression regulation, cellular function and defense, homeostasis, and pathogenesis. The genome-wide annotation of ncRNAs is a non-trivial task unless well-annotated genomes of closely related species are already available. Results: A homology search for structured ncRNA in the genome of S. mansoni resulted in 23 types of ncRNAs with conserved primary and secondary structure. Among these, we identified rRNA, snRNA, SL RNA, SRP, tRNAs and RNase P, and also possibly MRP and 7SK RNAs. In addition, we confirmed five miRNAs that have recently been reported in S. japonicum and found two additional homologs of known miRNAs. The tRNA complement of S. mansoni is comparable to that of the free-living planarian Schmidtea mediterranea, although for some amino acids differences of more than a factor of two are observed: Leu, Ser, and His are overrepresented, while Cys, Met, and Ile are underrepresented in S. mansoni. On the other hand, the number of tRNAs in the genome of S. japonicum is reduced by more than a factor of four. Both schistosomes have a complete set of minor spliceosomal snRNAs. Several ncRNAs that are expected to exist in the S. mansoni genome were not found, among them the telomerase RNA, vault RNAs, and Y RNAs. Conclusion: The ncRNA sequences and structures presented here represent the most complete dataset of ncRNA from any lophotrochozoan reported so far. This data set provides an important reference for further analysis of the genomes of schistosomes and indeed eukaryotic genomes at large.

    Assembling highly repetitive Xanthomonas TALomes using Oxford Nanopore sequencing

    Background: Most plant-pathogenic Xanthomonas bacteria harbor transcription activator-like effector (TALE) genes, which function as transcriptional activators of host plant genes and support infection. The entire repertoire of up to 29 TALE genes of a Xanthomonas strain is also referred to as its TALome. The DNA-binding domain of TALEs is composed of highly conserved repeats, and TALE genes often occur in gene clusters, which precludes the assembly of TALE-carrying Xanthomonas genomes based on standard sequencing approaches. Results: Here, we report the successful assembly of the 5 Mbp genomes of five Xanthomonas strains from Oxford Nanopore Technologies (ONT) sequencing data. For one of these strains, Xanthomonas oryzae pv. oryzae (Xoo) PXO35, we illustrate why Illumina short reads and longer PacBio reads are insufficient to fully resolve the genome. While ONT reads are perfectly suited to yield highly contiguous genomes, they suffer from a specific error profile within homopolymers. To still yield complete and correct TALomes from ONT assemblies, we present a computational correction pipeline specifically tailored to TALE genes, which yields accuracy at least comparable to that of Illumina-based polishing. We further systematically assess the ONT-based pipeline for its multiplexing capacity and find that, combined with computational correction, the complete TALome of Xoo PXO35 could have been reconstructed from fewer than 20,000 ONT reads. Conclusions: Our results indicate that multiplexed ONT sequencing combined with a computational correction of TALE genes constitutes a highly capable tool for characterizing the TALomes of huge collections of Xanthomonas strains in the future.